ZL Technologies at TREC 2009 Legal Interactive: Comparing Exclusionary and Investigative Approaches for Electronic Discovery Using the TREC Enron Corpus
Abstract
Organizations responding to requests to produce electronically stored information (ESI) for litigation today often conduct information retrieval with a limited amount of data that has first been culled by custodian mailboxes, date ranges, or other factors chosen semi-arbitrarily based on legal negotiations or other exogenous factors. The culling process does not necessarily take into account the composition of the data set and may, in fact, impede the expediency and cost-effectiveness of the eDiscovery process, because ESI not initially identified may need to be collected later. This exclusionary eDiscovery approach has been recommended by search and information retrieval technology providers in the past, based in part on the state of technology available at the time; however, the technology now exists to perform an inclusive, content-based, investigative eDiscovery across a large document collection without the introduction of semi-arbitrary exclusion factors. In this paper, we investigate whether limited document retrieval based on custodian email mailboxes results in lower recall and produces fewer responsive documents than a broader, inclusive search process that covers all potential custodians. To compare the two approaches, we designed an experiment in which two independent teams conducted electronic discovery, one with each approach. We found that searching across the entire data set identified significantly more responsive documents and more initial custodians than an approach that relies on custodian-based culling. Specifically, investigative eDiscovery found 516% more relevant documents and 1825% more initial custodians in our study.
Based on these results, we believe organizations that employ an exclusionary, culling-based methodology may require subsequent collections, risk underproduction and sanctions during litigation, and ultimately expend more resources in responding to eDiscovery production requests while obtaining a less comprehensive result.
Similar resources
TREC 2009 at the University of Buffalo: Interactive Legal E-Discovery With Enron Emails
For TREC 2009, the team from the University at Buffalo, the State University of New York, participated in the Legal E-Discovery track, working on the interactive search task. We explored indexing and searching at both the record level and the document level with the Enron email collection. We studied the usefulness of fielded search and document presentation features such as clustering documents...
Discovery of Related Terms in a corpus using Reflective Random Indexing
A significant challenge in electronic discovery is the ability to retrieve relevant documents from a corpus of unstructured text containing emails and other written forms of human-to-human communications. For such tasks, recall suffers greatly since it is difficult to anticipate all variations of a traditional keyword search that an individual may employ to describe an event, entity or item of ...
A Model for Understanding Collaborative Information Behavior in E-Discovery
The University of Pittsburgh team participated in the interactive task of the Legal Track in TREC 2009. We designed an experiment to investigate the collaborative information behavior (CIB) of a group of people working on the e-discovery tasks provided by the Legal Track in TREC 2009. Through these studies, we proposed a model for understanding CIB in e-discovery.
Experiments with the Negotiated Boolean Queries of the TREC 2009 Legal Track
For our participation in the Batch Task of the TREC 2009 Legal Track, we produced several retrieval sets to compare experimental Boolean, vector, fusion and relevance feedback techniques for e-Discovery requests. In this paper, we have reported not just the mean scores of the experimental approaches but also the largest per-topic impacts of the techniques for several measures. The experimental ...
Learning Task Experiments in the TREC 2011 Legal Track
The Learning Task of the TREC 2011 Legal Track investigated the effectiveness of e-Discovery search techniques at selecting training examples and learning from them to estimate the probability of relevance of every document in a collection. The task specified 3 test topics, each of which included a one-sentence request for documents to produce from a target collection of 685,592 e-mail messages...